I was unable to include this in the knit final version of my portfolio since the code blocks would not allow me to.
I was unable to include this the knit final version of my portfolio since the code blocks would not allow me to.
Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.
What were the main questions being asked?
What is the population size, density, and distribution of prokaryotes on earth? What quantity of essential nutrients, specfically carbon, are held by prokaryotes?
What were the primary methodological approaches used?
For aquatic and soil samples used direct counts. For unconsolidated sediments cited previous results. For terrestrial subsurface assumed porosity and volume of cell, and used groundwater data.
Summarize the main results or findings.
Most prokaryotes reside in seawater, soil, and sediment/soil subsurface. The prokaryote population contains large amounts of carbon, nitrogen, phosphorus, and other essential nutrients. The large population size of prokaryotes also suggests that there exist many events occuring in nature that we do not yet understand.
Do new questions arise from the results?
Prokaryotes are alarmingly abundant, play a role in important nutrient cycles, and experience a high mutation rate. This leads us to wonder what the true consequence of prokaryotic life on earth is.
Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
The paper contained a significant amount of evidence, that was presented in a meaningful way. The assumptions and conclusions were justified using this evidence. However, not all experimental methods were adequately explained. The paper could also use more structure in order to increase readability.
Comment on the emergence of microbial life and the evolution of Earth systems.
4.6Ga - moon formation, oldest dated zircon minerals found
4.1Ga - late heavy bombardment
3.8Ga - geological evidence for life on earth; photosynthetic fractionation by Rubisco
3.0Ga - production of oxygen by cyanobacteria decreases greenhouse gas effect; life on land; glaciation
2.6Ga - moelcular fossil studies show cyanobacteria and eukaryotes present
2.3Ga - evidence of redbed rock formation
1.7Ga - begins with glaciation and the emergence of eukaryotes, ends with another global glaciation
1.0Ga - animals and land plants emerge; followed by gigantism
Haden - Water vapour is high in the atmosphere and lot’s is lost to space. Rock vapour also present in the atmosphere. Seawater chemistry is controlled by volcanism and reactions with debris. The sun is faint meaning there was either global glaciation or warmth on earth was sustained by the high greenhouse gas content.
Archaean - Life on earth is now widespread (chemotrophic life). Earth is hot and active with volcanism. Sulphate in hydrothermal systems provides an oxygenation power.
Proterozoic - There is a rapid rise in oxygen as a result of the rise in biological activity. New rock formations and mineral types arise.
Phanerozoic - Life on earth is visible. Land plants carry out photosynthesis. The Earth has assumed its present configuration.
Evaluate human impacts on the ecology and biogeochemistry of Earth systems.
What were the main questions being asked?
Identify major earth system processes for which, if certain thresholds are crossed, a shift into a new system state could occur.
What were the primary methadological approaches used?
This paper cited existing data instead of conducting an actual experiment.
Summarize the main results or findings.
Conclude that there are nine earth processes which need to have defined boundaries which are largely interdependent (climate change, rate of biodiversity loss, interference with nitrogen and phosphorus cycles, stratospheric ozone depletion, ocean acidification, global freshwater use, change in land use, chemical pollution, atmospheric aerosol loading. There is evidence that three of these processes are already moving out of the Holocene state, since the current parameters are near or above the proposed boundaries.
Do new questions arise from the results?
The authors themselves acknowledge some of the gaps in the proposition. For example, that they have been using some of their “first best guesses” in defining boundaries. They also acknowledge that the interactions between boundaries are also not understood.
Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
The concept presented was unique and intriguing. There was evidence provided regarding the current states of planetary boundaries. However, the relationship between the state of these boundaries and the proposed shift into the proposed “Anthropocene” was not adequately justified. It wasn’t well explained how the authors came up with the boundaries, nor why reaching the boundaries was notable.
Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.
What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text.
The primary prokaryotic habitats on Earth are aquatic, soil, and oceanic and terrestrial subsurface habitats. These contain 1.181x1020, 2.556x1029 and 3.8x1030 cells respectively.
What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacterium including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?
Upper 200m of ocean contains total of 3.6x1028 cells. Of this, 2.9x1027 cells are marine cyanobacterium. This high ratio of cyanobacterium in the upper 200m of the ocean is significant because cyanobacterium are autotrophic, meaning they assimilate carbon and produce oxygen.
What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?
An autotroph fixes inorganic carbon. Heterotrophs assimilate inorganic carbon. Lithotrophs use inorganic substrates.
Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?
The deepest habitat capable of supporting prokaryotic life is at 3000m-4000m deep. The primary limiting factor at this depth is temperature.
Based on information provided in the text your knowledge of geography what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?
The highest habitat capable of supporting prokaryotic life is at 55-77km high (perhaps an overly optimistic estimate). The primary limiting factor is likely access to other organisms and nutrients, or ionizing radiation.
Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?
Estimated vertical distance of Earth’s biosphere is 58-81km.
How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)
Use estimates of prokaryotic carbon in environment to set limits for turnover rates. The population size multiplied by the yearly turnover should yield the annual cellular production.
For example:
(3.6x1028 cells) x (day/16) x (365day/yr) = 8.2x1029cells/year.
What is the relationship between carbon content, carbon assimilation efficiency and turnover rates in the upper 200m of the ocean? Why does this vary with depth in the ocean and between terrestrial and marine habitats?
Carbon assimilation will increase carbon content, and carbon turnover will decrease carbon content. Carbon content likely increases with depth in the ocean because detritus sinks. Assimilation efficiency and turnover rates probably decrease with depth because productivity decreases since sunlight is lacking, temperature decreases, and pressure increases. The amount of C in prokaryotes much higher in terrestrial versus marine habitats. However, the net primary productivity is higher in marine habitats.
How were the frequency numbers for four simultaneous mutations in shared genes determined for marine heterotrophs and marine autotrophs given an average mutation rate of 4 x 10-7 per DNA replication? (Provide an example of the calculation with units. Hint: cell and generation cancel out)
Example calculation for upper 200m heterotroph:
(3.6x1028 cells)x(22.8 turnover/year) = 8.2x1029 cells/year
(4x10-7 mutations/generation)4 = 2.56x10-26 mutations/generation
(8.2x1029 cells/year)x(2.56x10-26 mutations/generation) = 2.1x104 mutations/year
(8.7x103)/(2.1x104 mutations/year) = 0.4 mutations/hour
Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?
The large population size and high mutation rate results in an increase in genetic diversity and adaptive potential for prokaryotes. Some methods of diversification include point mutation, plasmid transfer, recombination, etc.
What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?
High abundance an rapid replication result in a high mutation rate. High mutation rate causes an increase in diversity of prokaryotes which, in turn, increases the metabolic potential.
Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.
What are the primary geophysical and biogeochemical processes that create and sustain conditions for life on Earth? How do abiotic versus biotic processes vary with respect to matter and energy transformation and how are they interconnected?
Primary geophysical processes are based on tectonics and atmospheric photochemical processes. The high level of hydrogen in Earth’s atmosphere results in the majority of these abiotic processes being based on acid base chemistry. Acid base reactions supply and remove substances to create cycling of nutrients. The primary biogeochemical processes are abiotic acid base reactions and biotic redox reactions both based on cycling of H, C, N, O, S, and P. All the processes connect to create balance in elements and processes on earth.
Why is Earth’s redox state considered an emergent property?
Earth’s redox state is considered an emergent property since it has emerged as a result of interactions between microbial metabolic redox reactions and abiotic geochemical processes.
How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?
Reversible electron transfer reactions of elements and nutrients are fundamentally globally interconnected. The biosphere attains a self-sustaining state in different ecological scales as a result of coupled nutrient cycling. In order to overcome thermodynamic barriers to reversible electron flow, microbial metabolic pathways have evolved to use products of one cycle as substrates in another.
Using information provided in the text, describe how the nitrogen cycle partitions between different redox niches and microbial groups. Is there a relationship between the nitrogen cycle and climate change?
The nitrogen cycle partitions between different redox niches and microbial groups based on the availability of oxygen and organic matter. In the presence of oxygen, a specific group of Bacteria or Archaea perform the first oxygenation. The second oxidation is performed by a different group of nitrifying bacteria. In the absence of oxygen, a third set of microbes will use nitrogen dioxide and nitrate as electron acceptors to close the cycle. Nitrous oxide is potent greenhouse gas involved in the nitrogen cycle, which may effect climate change.
What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?
The more microbial genomes we sequence, the more unique protein families we discover. It is argued that the vast number of genomic information discovered recently suggests a “limitless evolutionary diversity in nature”. Most of the genetic diversity observed is contained within nonessential genes and environment specific genes. Essential genes involved in metabolism do not display the same level of diversity.
On what basis do the authors consider microbes the guardians of metabolism?
Microbes harbour core metabolic machinery that stands the test of time. They are purposed to “ferry” critical gene sets through extinction events, due to their inherent resilience.
Microbial life can life easily without humans, who themselves could not survive without the global catalysis and environmental transformations provided by microbes. Their sheer abundance, resilience and adaptability coupled with the vital reactions they conduct render them essential and not realistically replaceable. Specifically, microbes are responsible for the Earth’s atmospheric oxygen concentration and play a number of indispensable roles in the nitrogen cycle. In addition, without microbial life, humans would likely be driven into famine due to the resulting loss in food sources such as seafood, legumes, and ruminant mammals. Finally, this paper will describe how microbes will be integral components in re-establishing stability on earth, in the aftermath of industrialization.
The oldest evidence for life on earth comes from microbial fossils dated at 3.8Ga. Cyanobacteria are universally believed to be responsible for the Earth’s initial rise in atmosphere oxygen at 3Ga, ultimately leading to the emergence of plant and animal life forms (Nisbet 2001). Cyanobacteria and microscopic algae continue to supply oxygen to the atmosphere. Although plants are capable of conducting photosynthesis, respiration and decay negate the effects of photosynthesis by plants on the atmospheric oxygen concentration. On the other hand, marine photosynthesis by eukaryotic algae results in a net production of oxygen in the atmosphere since 0.1% of oceanic organic matter is buried in sediments (Kasting 2002). This is the oxygen that humans rely on for respiration. Establishing the oxygen content in the present atmosphere, marine eukaryotic algae is evidently indispensable to human life on earth. If microbes ceased to exist, humans would not be able to respire, and so would not survive.
In regards to the nitrogen cycle, microbes are likewise essential. Nitrogen fixing microbes are able to provide the activation energy needed to break nitrogen bonds using enzymes like nitrogenase. Humans require fixed nitrogen to use as building blocks for essential macromolecules, such as amino acids and nucleic acids. Therefore, we cannot survive without nitrogen fixing prokaryotic bacteria. Denitrifying bacteria are play a significant role in maintaining productivity in anoxic deep ocean environments. It is worth acknowledging that, in theory, certain nitrogen containing compounds in the nitrogen cycle could be reduced to ammonium at extremely high temperatures and in the presence of certain forms of iron. If such a reaction were to be actively taking place on Earth, one would anticipate it occurred in hydrothermal vent systems. However, no such process has been observed in these locations. The lack of abiotic nitrogen cycling in even the most extreme of environments reaffirms our dependency on microbes (Canfield 2010).
Microbial life also indirectly contributes to human life by sustaining life in other organisms. This is especially relevant in food sources such as seafood, legumes and ruminant mammals. The microbial loop sustains the marine food web by mineralizing dissolved nutrients, including dissolved organic carbon. These nutrients have large scale ecological consequences, as they are ultimately incorporated into seafood. Without microbes, the dissolved nutrients would sink to the bottom of the ocean and the energy of these substances would be inaccessible to larger marine life.
Rhizobacteria is a nitrogen fixing bacteria that exists in symbiosis with legumes. It results in increased nitrogen fixation, serving as the main source of nitrogen for plants. Rhizobacteria has been identified in arid environments, suggesting its significance in maintaining sustainable crop yields in times of recurring climate instability and decreased availability of arable land (Ahran 1999). Therefore, humans are likely to become increasingly reliant on microbes for food production as the climate continues to change.
Ruminants are mammals who acquire nutrients by digesting plant based materials in specialized chambers of the stomach. These chambers contain cellulose digesting bacterial symbionts. Ruminants, including cattle and goats, are farmed for meat and animal products in food production. In the absence of symbiotic microorganisms, these mammals would be severely limited in their capacity to digest food. This would decrease the efficiency of food production significantly. One could imagine a humanity without either seafood, or legumes, or ruminant mammal products. However, the loss of microbial life would result in the simultaneous loss of not just these three food sources, but many others as well. It is simply unreasonable to assume that humans will have the capacity to engineer food products without one of the most fundamental components.
It is predicted that humanity is driving the Earth into a period of instability. Rockstrom describes the stability of the Earth in terms of “planetary boundaries” which, if crossed, could shift Earth subsystems into a new state. Human actions are the most significant factor in global environmental change at present. However, this change is an unintentional consequence of industrial and agricultural practices. Before the rise of civilization, changes in atmospheric reactions and geological processes occurred naturally over long periods of time in a regulated state. The emergence and development of industrialization, coupled with the expansion of agricultural practices in recent years has disrupted natural cycles. Global warming from greenhouse gas emissions and eutrophication due to nitrogen discharge from farming land are challenging planetary stability. Humans will be reliant on the adaptability of microbiomes to re-establish equilibrium. Microbes have survived since the dawn of life due to their resilience and adaptability. Their high replication rate and abundance enables them to respond to change rapidly and on an immense scale. This change occurs faster and on larger scales than human technology. Simply consider how rapidly antibiotic resistance arises in respond to antibiotics. It is for this reason that microbes must be present in order for life to continue on earth (Rockstrom 2009).
The quantity of microbial life on earth unfathomably vast. For example There is an estimated 1030 prokaryotic cells on earth, and around 13x1028 bacterial cells in the ocean (Whiteman 1998). Each of these cells is continuously metabolizing, contributing to the biogeochemistry of earth. Replacing microbial functions would have to scale accordingly. It would not be energetically possible to replace even just a sample of essential reactions described in this paper. It is in human nature to perceive the world from an anthropocentric perspective. Humans have managed to quantify the drastic alterations we have instigated in the environment. This has been used as evidence to the power and intelligence of humanity. It has led us to believe that we have the ability to manipulate all aspects of Earth’s environment when, in fact, our excessive lifestyle has repercussions beyond our control. The reality is that the existence of humanity has been fundamentally reliant on microbial life, since the dawn of civilization.
Canfield DE et al. 2010. The evolution and future of earth’s nitrogen cycle. Science. 330:192. doi: 10.1126/science.1186120.
Kasting JF, and Siefert JL. 2002. Life and the Evolution of Earth’s Atmosphere. Science. 296(5570):1066-1068. doi: 10.1126/science.1071184.
Nisbet EG, Sleep NH. 2001. The habitat and nature of early life. Nature. 409:1083-1091. doi:10.1038/35059210.
Rockstrom et al. 2009. A safe operating space for humanity. Nature. 461:472-475. doi:10.1038/461472a.
Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578-6583. PMC33863.
Falkowski PG, Fenchel T, Delong EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science. 320(5879):1034-1039. Science1153213
Kasting JF, and Siefert JL. 2002. Life and the Evolution of Earth’s Atmosphere. Science. 296(5570):1066-1068. Science1071184
Rockstrom et al. 2009. A safe operating space for humanity. Nature. 461:472-475. Nature461472
Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578–6583. PMC33863
Discuss the relationship between microbial community structure and metabolic diversity.
Evaluate common methods for studying the diversity of microbial communities.
Recognize basic design elements in metagenomic workflows.
What were the main questions being asked?
To study PR photosystem structure and function using DNA libraries from marine picoplankton. Especially determine whether light driven ATP synthesis could be transferred to heterologous hosts in a single recombination event.
Summarize the main results or findings.
Two fosmids were identified that contained genes that are necessary and sufficient for proteorhodopsin based phototrophy. These were cloned into E. coli cells. Exterior pH and interior ATP concentration changed when the modified E. coli cells were exposed to light. These fosmids also contained genes sufficient to produce retinol so long as the cells already produced FPP (e.g. E. coli). Copy number of the genes showed a difference in phenotypic identification.
Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
How many prokaryotic divisions have been described and how many have no cultured representatives (microbial dark matter)?
The total number of groups we use to organize life is constantly increasing. The information we have about groups comes from sequencing data. The majority of life is uncultivated.
How many metagenome sequencing projects are currently available in the public domain and what types of environments are they sourced from?
There are thousands of metagenome sequencing projects, and the numbers are always changing. It should also be noted that the public domain constitutes only a small fraction of the projects that are currently taking place, as the majority of the projects are not available on public databases. These projects are sequences from all types of environments including sediments, soil, gut, and aquatic environments. They are especially useful in circumstances where it is hard to culture communities in lab settings.
What types of on-line resources are available for warehousing and/or analyzing environmental sequence information (provide names, URLS and applications)?
SHOTGUN METAGENOMICS: Assembly - Euler; Binning - S-GCOM; Annotation - KEGG; Analysis pipeline - Megan 5; Databases - IMG, MG-RAGT, NCB
MARKER GENE METAGENOMICS: Stand alone software - OUTbase; Analysis pipeline - SILVA; Denoising - Ampliconnoise; Database - Ribosomal Database Project
What is the difference between phylogenetic and functional gene anchors and how can they be used in metagenome analysis?
Phylogenetic gene anchors are used to identify phylogeny. They are used with vertical transfer, universal, single copy genes. They are used to create taxonomic trees by comparing the minute differences that develop over time. On the other hand, functional gene anchors are used to identify functions. They don’t often carry phylogenetic information. For example, methanogenesis is very specific to methanogens. So if you identify an enzyme for methanogenesis, you will only get information about a number of organisms. In addition, they tend to be horizontally transferred, so are not as useful in investigating taxonomy.
What is metagenomic sequence binning? What types of algorithmic approaches are used to produce sequence bins? What are some risks and opportunities associated with using sequence bins for metabolic reconstruction of uncultivated microorganisms?
Metagenomic sequence binning is a process of grouping sequence reads into operational taxonomic units. The algorithmic approaches used are to align sequences to databases or group sequences to each other based on DNA characteristics (for example: GC content or codon usage). Some risks associated with this technique are incomplete coverage of genome sequence, or contamination from different phylogeny. For this reason, metagenomic sequence binning is a significant controversy in the field.
Is there an alternative to metagenomic shotgun sequencing that can be used to access the metabolic potential of uncultivated microorganisms? What are some risks and opportunities associated with this alternative?
FISH is an alternative technique to metagenomic shotgun sequencing. It’s fairly accurate and precise. However, it is quite time consuming and expensive. The signal is also short lived. Some other techniques are functional screens (biochemical assays) and 3rd generation and single cell sequencing.
Martinez et al. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proc Natl Acad Sci USA. 104(13):5590-5595. PMC1838496
Wooley JC, Godzik A, Friedberg I. 2010. A Primer on Metagenomics. Proc Natl Acad Sci USA. 6(2). PMC2829047
Evaluate the concept of microbial species based on environmental surveys and cultivation studies.
Explain the relationship between microdiversity, genomic diversity, and metabolic potential.
Comment on the forces mediating divergence and cohesion in natural microbial communities.
What were the main questions being asked?
Define difference between uropathogenic and enterohemorragic strains of E. coli by comparing genome sequences. Specifically compare CFT073 to EDL933 and MG1655.
What were the primary methodological approaches used?
Random clones were sequenced using a dye. Use PCR based techniques to analyze data. Data was collected using automated sequencers and sequences assembled with SEQMANII. Genome was annotated with MAGPIE (uses GLIMER to find ORFs) and proteins annotated with BLAST search.
Summarize the main results or findings.
The common core E. coli chromosome is preserved through vertical transfer. Island locations often at same position in backbone in CFT073 and EDL933 strains. This results in ability to infect the urinary tract and bloodstream. Differences in enterohemorhaggic and uropathogenic E. coli are reflected in CFT073 lack of type III secretion system and certain phage/plasmid virulence genes. In conclusion, it’s not enough to define species by some phenotypic traits and low resolution mapping; there’s much more to be studied and considered
Do new questions arise from the results?
The authors of the article didn’t explicitly mention questions arising from the results. However, they do point out the vast quantity of unique DNA in E. coli strains, perhaps suggesting these need to be defined further. In addition, the authors do present a number of propositions throughout the article that should be confirmed.
Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
The paper was well formatted with a clear abstract, materials and methods, results, discussion portions. Background information provided was succinct and relevant to the rest of the paper. It did a good job at explaining why their research was relevant and worthy. Although the background information section did not justify the methods or provide information that would support the results they drew from their data. The authors made some assumptions, but pointed them out (ex. criteria for inferring orthology of CFT073 genome). Like the rest of the paper, the number of table and images provided was sufficient, and effective in supporting major points of the paper.
Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution
Identify common molecular signatures used to infer genomic identity and cohesion
Differentiate between mobile elements and different modes of gene transfer
Based on your reading and discussion notes, explain the meaning and content of the following figure derived from the comparative genomic analysis of three E. coli genomes by Welch et al. Remember that CFT073 is a uropathogenic strain and that EDL933 is an enterohemorrhagic strain. Explain how this study relates to your understanding of ecotype diversity. Provide a definition of ecotype in the context of the human body. Explain why certain subsets of genes in CFT073 provide adaptive traits under your ecological model and speculate on their mode of vertical descent or gene transfer.
The figure is describing gene islands, which are used as evidence of horizontal gene transfer. From the image, we can see there are many shared locations of gene islands, whose size can vary up to 80bp. We would anticipate that islands of the same size and location are nearly identical, and it is the varying islands that define differences between uropathogenic and enterohemorrhagic strains.
Ecotype diversity describes a distinct form or race of an organism in a particular habitat. For example, within the body ecotype diversity can be manifested as the distinct microbes that occupy characteristic parts of our bodies. Therefore, ecotype diversity can be caused by differences in gene islands occurring as a result of HGT. Based on this, the CFT073 adaptive traits a result of the presence of asnT,V, metV or (pap)pheU, the absence of serW, or the difference between the sizes of many of the islands, (pap-hly)pheV and pheV, or selC and selC(LEE). These adaptive traits are likely acquired by HTG.
Many challenges exist in defining microbial species, mainly a lack of consensus in the academic community, the constant evolution of microbes, and the constant evolution of technology. This is why many schools of thought exist regarding Species Concepts and species definitions. Species definition began with Mayr’s Biological Species Concept. As the prominence of microbial life became evident, microbiologists began to favor the new Phylogenetic Species Concept. However, even this system cannot adequately account for the rapid evolution of microbes by horizontal gene transfer (HGT). The ideal species classification system would integrate all fields of consideration, including genetics, ecology, and evolution.
Mayr’s Biological Species Concept is the most widely accepted species concept. It describes species as being organisms which can interbreed. This definition cannot be applied to microbes, as they reproduce asexually. Instead, over the past fifty years, microbial species have been defined based on the level of pairwise DNA-DNA hybridization (DDH) between the DNA of respective genomes. This method entails labelling one strand of DNA, mixing it with an unlabelled DNA sample, then observing the quantity of hybrids that arise. It is expected that sequences with greater similarity will bind at a higher frequency. A 70% DDH value is considered the universal standard within which a species would exist (Konstantinidis 2006). However, many argue that this standard is too liberal and not an accurate representation of organism similarity. In fact, the cut off for species definition was never scientifically justified. It was chosen to match existing species definitions at that time. In the time since this method was developed, genome sequencing has become a much more viable option. The average nucleotide identity (ANI) method measures the extent of nucleotide similarity between coding regions of genomes. A 95% ANI corresponds to the 70% DDH criteria. A single bacterial species may be as diverse as an entire vertebrate order and that the diversity of prokaryotic cells could be largely underestimated in comparison to eukaryotes (Doolittle 2009). This is a clear indication that our current system of prokaryotic species classification is lacking in rigour. This also indicates the importance in designing a new system in concurrence with other proven methods of classification.
The Phylogenetic Species Concept is based on comparisons of fixed morphological differences between organisms. It is more suitable for microbial species definition than the Biological Species Concept, since it is less restrictive in terms of reproduction, so can be applied to asexual organisms. Microbial phylogenetic trees are constructed using 16S rRNA sequencing data since it is highly conserved and ubiquitous across microbes. In comparison to more recent sequencing technologies, this method is unreliable as it yields poor genetic resolution. For example, Shigella can yield nearly identical 16S rRNA sequences to those of Escherichia coli by the ANI comparison method (Doolittle 2009). However, phylogenetic trees are useful since they can be used to study the divergence of species across time.
One of the most significant obstacles faced in microbial species definition is HGT. The sheer abundance and rapid replication rate of microbes yields a high frequency of HGT. This is an evolutionary necessity, as frequent mutations increase the diversity and, consequently, the adaptability of populations. This also means that HGT can increase genetic variation in species and distribute phylogenetic pathways so they are no longer characteristic. Consider one study on Streptococcus algalactiae which revealed that each genome acquired by HGT would result in at least thirty new genes in the pan genome. Another study showed that 8-21% of genes in E.coli strains are strain specific (Achtman 2008). These genes, acquired by HGT, are often found in association with genomic islands. This clearly complicates sequence based phylogenetic tree assembly, as a large quantity of genetic information is rapidly exchanged between microbes existing in proximity. Even if an effective, universal microbial species concept is defined, the species themselves will constantly be evolving due to HGT. The enormity of the task of tracking such changes could be mitigated by focusing on known gene islands and tracking the changes in these regions over time.
Ideally, the future will yield technology where one does not have to make such compromises, and could efficiently sequence whole genomes. This would clearly provide the most comprehensive view of microbial species evolution. Its more than likely that HGT will impact regions outside of the notable gene islands. Whole genome sequencing would also be useful in incorporating ecotype definitions into our microbial species concept, since ecologically relevant genes would also be considered (Konstantinidis 2006).
It is necessary to have a universal definition for a microbial species. This allows for consolidation of knowledge and effective information sharing, ultimately promoting further advances in science and technology. The species themselves and the technologies we use to define them are always sharing. It is important that this information be clearly communicated across all parties involved in this type of research, so as to facilitate further development in the field.
Achtman M, Wagner M. 2008. Microbial diversity and the genetic nature of microbial species. Nature Reviews Microbiology. 6:431-440. doi: 1038/nrmicro1872.
Doolittle WF, Zhaxybayeva O. 2009. On the origin of prokaryotic species. 19:744-756. doi:10.1101/gr.086645.108.
Konstantinidis et al. 2006. The bacterial species definition in the genomic era. Proc Natl Acad Sci USA.361(1475):1929-1940. doi:10.1098/rstb.2006.1920.
Haya Abuzuluf
Jack Anthony
Judy Ban
Ryan Nah
Ryan Lou
Sawera Dhaliwal
Water samples from various depths of Saanich Inlet, a model ecosystem for the effects of growing oxygen minimum zones in the open ocean, were analyzed via 16S iTag amplicon sequencing and processed using both mothur and QIIME2 independently. The correlation between changes in abundance of Sulfurimonas, a genus encompassing several species of sulfur-oxidizing Epsilonproteobacteria, with oxygen, sulfide, and nitrate concentrations was determined by use of a linear regression model in R. Statistically significant associations with likely biological relevance were discovered using mothur but not for QIIME2. Statistically significant correlations between individual OTUs and ASVs with various nutrient concentrations were similarly discovered, with more identified by mothur than QIIME2. Sequence processing with mothur and QIIME2 arrived at very different conclusions, suggesting that the two fundamentally different analysis philosophies lead to very different results. However, the statistical and methodological weaknesses of this study do not enable strong claims for or against the use of either analysis pipeline. We discuss these downfalls and possible directions for future work in this area to build on our results.
Saanich Inlet is a seasonally anoxic fjord located off the southeastern coast of Vancouver Island, British Columbia [1]. During fall and winter, strong winds mix water from the Strait of Georgia into the inlet and replace bottom water [2]. As the weather gets calmer in spring, deep water in the inlet is retained by a shallow sill near the inlet mouth and increasing stratification. High levels of primary productivity export organic matter from the euphotic zone leading to oxygen loss with depth, caused by microbial remineralization of said organic matter. This organic matter fueled respiration is sufficient to create hypoxic conditions below 100m and anoxic conditions below 150m [1]. Saanich Inlet’s predictable, recurring anoxia presents an excellent model exosystem for the study of processes occurring in other anoxic marine environments all around the world.
Organic carbon is used by heterotrophic bacteria as an energy source in aerobic respiration. As oxygen availability drops, methane, ammonia, and hydrogen sulfide build up as the products of anoxic metabolism [1]. Globally, this process has the effect of creating bands of hypoxic water between 100m and 1000m in the open ocean. These oxygen minimum zones (OMZs) are expanding in all oceans and are expected to continue expansion as a consequence of anthropogenic climate change [3]. In Saanich Inlet, the seasonal flushing of new water into the deep removes the buildup of metabolites and refreshes the supply of oxygen each winter. The predictable and recurring nature of the physical and chemical cycling of Saanich Inlet makes it an ideal model ecosystem to study the biological processes that occur in oxygen minimum zones (OMZs) globally. The species present in a particular water sample are largely influenced by the availability of specific terminal electron acceptors (TEAs). Saanich Inlet offers an annually reset gradient of changing TEAs with depth, making it an excellent system to study the change of community structure in relation to depth or oxygen.
Operational taxonomic units (OTUs) are a representation of taxonomic grouping used for both metagenomic and 16S amplicon sequencing data, where genetic sequences which fall within some limit of similarity are clustered together into an OTU [4]. While this serves to overcome the problem of sequencing error causing the true sequence to be masked by grouping of the similar sequences, if the similarity cutoff is not sufficiently stringent, multiple species may be clustered together. Similarly, if it is too strict, a single species may be split up into multiple OTUs. Additionally, as OTUs represent a cluster of a number of distinct sequences generated for a specific data set, OTU abundances are not comparable across studies and datasets. Despite these issues, for the purposes of microbial community analysis, OTUs generated at the 3% sequence similarity cutoff have long been the de facto proxy for species identity.
Recently, an alternative to OTUs has been presented; namely, the amplicon sequence variant, or ASV [5]. ASVs are determined under the idea that the true sequence would appear more commonly than would an erroneous sequence, and thus only the sequences thought to be real sequences in this way are retained, disregarding all erroneous sequences. While this leads to discarding large amounts of sequence information, the resulting true sequences are comparable between data sets as a specific genetic sequence is associated with any given ASV. Though ASVs represent an actual sequence independent of a given dataset which is arguably superior to that of an OTU definition, whether their use confers advantages over OTUs has not been conclusively determined presently.
Sulfurimonas was selected as our taxon of interest due to its unusual metabolic pathways and the effects said pathways have on the environment. This bacterium can have either spiral or curved rod shaped cells and has one or two flagella for motility. These chemolithoautotrophs reduce nitrate and nitrite using sulfur compounds or H2 as electron sources [6]. Such a coupling of nitrogen and sulfur cycles in a single bacterium suggests a fascinating interplay between abundance and TEA availability. Sulfurimonas would be expected to increase in abundance with nitrate and oxidized sulfur concentrations, as well as decrease in abundance in the presence of oxygen, as it may be outcompeted by more organisms that find the surface environmental reduction potential more favourable. However, interactions with other species in the anoxic region, combined with limiting nitrate gradients near the bottom may cause the abundance to peak at intermediate depths, before declining in the deepest part of the water column.
The environmental samples used in this project were collected through time-series monitoring in Saanich Inlet on a monthly-basis aboard the MSV John Strickland at station S3 (48°35.500 N, 123°30.300 W) [1]. Samples for large volume (LV) SSU rRNA gene tags, metagenomics, metatranscriptomics, and metaproteomics were taken from six major depths spanning the oxycline (10, 100, 120, 135, 150, 165, and 200m). Large volume waters were collected in 2x12 1 Go-Flow bottles on a wire and gathered into 2L Nalgene bottles with sterile silicon tubing immediately following sampling for dissolved gases to minimise changes in microbial gene expression. A 0.22 μm Sterivex filter was used to collect biomass from collected water samples. Genomic DNA was then extracted from these filters and used to generate small subunit ribosomal RNA (SSU or 16S/18S rRNA) gene pyrotag libraries. PCR amplification targeting the V4-V5 region of SSU rRNA gene was performed to generate iTag datasets or amplicons. Samples were sequenced according to the standard operating protocol on an Illumina MiSeq platform at the JGI with 2x300bp technology. Using as consistent parameters, sequences were processed through both mothur [7] and QIIME2 [8]. Two phyloseq objects resulted from the processing which were then used in subsequent analyses.
Analysis was completed in R v3.4.3 [9] using the following packages.
library("tidyverse")
## ── Attaching packages ──────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.4
## ✔ tidyr 0.8.0 ✔ stringr 1.3.0
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library("phyloseq")
library("vegan")
## Warning: package 'vegan' was built under R version 3.4.4
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-1
library("corrplot")
## corrplot 0.84 loaded
Sample data from mothur and QIIME2 were organized and normalized using data.frame() function to make tabular data, where cases were represented in rows and measurements in columns. Relative abundances of our taxa of interest were compared across nutrient gradients using a linear regression model. Linear models describe a continuous response variable as a function of one or more predictor variable [10]. They were used in our analyses since each of the nutrient concentrations is a continuous variable with order.
Figure 1. Nutrient concentration plotted with depth
Overall microbial community structure at the phylum as determined by mothur in terms of taxonomic breakdown changes slightly from 10m to 100m, then is relatively consistent throughout all the lower depths (Fig. 2). Most depths are dominated by a large population of Proteobacteria and Bacteroidetes, the relative population of the latter giving way to the Proteobacteria with depth. Similar observations are seen in the QIIME2-processed data (Fig. 3).
Figure 2. Relative abundance of phyla present in Saanich Inlet based on sample depth as determined by mothur.
Figure 3. Relative abundance of phyla present in Saanich Inlet based on sample depth as determined by QIIME2.
In the mothur-processed data, Chao1 richness indicates that the number of identified OTUs increases until a peak at 100m, then decreasing to the deepest parts of the inlet, starting at 120m with a small increase at 200m (Fig. 4). Species diversity, as measured by the Inverse Simpson index, also peaks at 100m then decreases with increased depth. As oxygen concentrations decreases with depth in Saanich Inlet (Fig. 1), a higher oxygen concentration correlates with both increased richness and diversity in the sample.
## Warning in (function (..., row.names = NULL, check.rows = FALSE,
## check.names = TRUE, : row names were found from a short variable and have
## been discarded
## Warning in (function (..., row.names = NULL, check.rows = FALSE,
## check.names = TRUE, : row names were found from a short variable and have
## been discarded
## Warning: Removed 7 rows containing missing values (geom_errorbar).
Figure 4. Alpha Diversity Indicators - Chao1 and Inverse Simpson Index plotted against depth and coloured to compare with [O2] from mothur-processed data.
As for the QIIME2-processed data, Chao1 richness peaks at 120m where as before it decreases with increasing depth, increasing slightly at 200m (Fig. 5). Diversity, on the other hand, peaks at 10m, which then decreases monotonically with depth. Diversity and richness appear less strongly correlated in the QIIME2 dataset. However, the overall trends in richness and diversity of Saanich Inlet are observable regardless of analysis pipeline.
## Warning in (function (..., row.names = NULL, check.rows = FALSE,
## check.names = TRUE, : row names were found from a short variable and have
## been discarded
## Warning in (function (..., row.names = NULL, check.rows = FALSE,
## check.names = TRUE, : row names were found from a short variable and have
## been discarded
## Warning: Removed 7 rows containing missing values (geom_errorbar).
Figure 5. Alpha Diversity Indicators - Chao1 and Inverse Simpson Index plotted against depth and coloured to compare with [O2] from QIIME2-processed data.
In the mothur-processed data, correlation matrix analysis indicates that Sulfurimonas abundance appears to increase with depth, decrease with oxygen and nitrate concentration, as well as increase with sulfide concentration (Fig. 6). Indeed, linear regression of nutrient concentration and Sulfurimonas relative abundance indicate that the concentrations of each of the 3 nutrients significantly relate to Sulfurimonas relative abundance (Fig. 8; p < 0.01 after FDR-correction; Table 1). This overall pattern is observable in the QIIME2-processed data as well, while the correlations differ in strength (Fig. 7) and are not statistically significant when multiple linear regression is performed (Fig. 9; p > 0.05 after FDR-correction; Table 2).
Figure 6. Correlation matrix of nutrient concentration and Sulfurimonas abundance as determined with mothur.
Figure 7. Correlation matrix of nutrient concentration and Sulfurimonas abundance as determined with QIIME2.
| Estimate | Std..Error | t.value | p.value | FDR-corrected | |
|---|---|---|---|---|---|
| (Intercept) | 0.0224803 | 0.0022788 | 9.865069 | 0.0022148 | 0.0044296 |
| O2_uM | -0.0000980 | 0.0000156 | -6.286284 | 0.0081297 | 0.0081297 |
| H2S_uM | 0.0023563 | 0.0002058 | 11.449147 | 0.0014300 | 0.0044296 |
| NO3_uM | -0.0007633 | 0.0001212 | -6.300053 | 0.0080795 | 0.0081297 |
| Estimate | Std..Error | t.value | p.value | FDR-corrected | |
|---|---|---|---|---|---|
| (Intercept) | 0.0809162 | 0.0241215 | 3.354527 | 0.0439111 | 0.1756445 |
| O2_uM | -0.0003616 | 0.0001650 | -2.192073 | 0.1160289 | 0.2320579 |
| H2S_uM | 0.0029215 | 0.0021785 | 1.341047 | 0.2723990 | 0.2723990 |
| NO3_uM | -0.0020102 | 0.0012825 | -1.567472 | 0.2149981 | 0.2723990 |
Figure 8. Relative abundance of Sulfurimonas plotted against nutrient concentration from the mothur-processed dataset.
Figure 9. Relative abundance of Sulfurimonas plotted against nutrient concentration from the QIIME2-processed dataset.
Mothur identified eight OTUs classified as Sulfurimonas, all of which were only identified in the samples at depths below 100m (Fig. 10). As at the taxon level, the abundances of most, of these OTUs increase with depth, and all but one were present at the deepest depth. Similarly, QIIME2 identified seven Sulfurimonas ASVs (Fig. 11), meaning that the richness determined by the two analysis pipelines is roughly comparable. However, fewer ASVs were shown to increase in abundance with depth, and just under half were identified in the 200m sample.
Figure 10. Relative abundances of each Sulfurimonas OTU as determined by mothur, recolored by oxygen concentration at the given depth of the sample.
Figure 11. Relative abundances of each Sulfurimonas ASV as determined by QIIME2, recolored by oxygen concentration at the given depth of the sample.
Multiple linear regression of the mothur-processed data indicated that the abundances of 7 of the 8 identified OTUs significantly correlated following false discovery rate correction with one or more of the tested nutrients (NO3-, H2S, and O2), most commonly sulfide (Table 3). A representative OTU is shown in Figure 12, OTU0308, which showed a significant correlation with all 3 tested nutrient variables. In comparison, QIIME2-processed data showed no significant correlations with either oxygen nor nitrate concentration, and the abundances of only 3 of the 7 identified ASVs significantly correlated with sulfide concentration (Table 4). A representative ASV is shown in Figure 13, ASV1250, which showed a correlation of abundance with sulfide concentration.
| OTU | Variable | Estimate | Std_Error | t_value | p_value | FDR_corrected |
|---|---|---|---|---|---|---|
| Otu0308 | O2 | -0.0000601 | 0.0000080 | -7.4950313 | 0.0049204 | 0.0168698 |
| Otu0308 | H2S | 0.0013338 | 0.0001058 | 12.6015568 | 0.0010776 | 0.0113253 |
| Otu0308 | NO3 | -0.0004953 | 0.0000623 | -7.9496787 | 0.0041516 | 0.0166066 |
| Otu0666 | O2 | -0.0000203 | 0.0000046 | -4.4026333 | 0.0217284 | 0.0474074 |
| Otu0666 | H2S | 0.0000584 | 0.0000609 | 0.9589865 | 0.4083113 | 0.4666414 |
| Otu0666 | NO3 | -0.0001611 | 0.0000358 | -4.4974784 | 0.0205213 | 0.0474074 |
| Otu0704 | O2 | 0.0000061 | 0.0000063 | 0.9602731 | 0.4077575 | 0.4666414 |
| Otu0704 | H2S | 0.0008052 | 0.0000834 | 9.6541059 | 0.0023594 | 0.0113253 |
| Otu0704 | NO3 | 0.0000623 | 0.0000491 | 1.2682777 | 0.2941783 | 0.4412674 |
| Otu0751 | O2 | -0.0000269 | 0.0000055 | -4.8516438 | 0.0167137 | 0.0445698 |
| Otu0751 | H2S | -0.0002618 | 0.0000731 | -3.5797332 | 0.0372935 | 0.0745869 |
| Otu0751 | NO3 | -0.0002206 | 0.0000430 | -5.1242045 | 0.0143892 | 0.0431675 |
| Otu1315 | O2 | 0.0000022 | 0.0000023 | 0.9602731 | 0.4077575 | 0.4666414 |
| Otu1315 | H2S | 0.0002899 | 0.0000300 | 9.6541059 | 0.0023594 | 0.0113253 |
| Otu1315 | NO3 | 0.0000224 | 0.0000177 | 1.2682777 | 0.2941783 | 0.4412674 |
| Otu2793 | O2 | 0.0000005 | 0.0000005 | 0.9602731 | 0.4077575 | 0.4666414 |
| Otu2793 | H2S | 0.0000644 | 0.0000067 | 9.6541059 | 0.0023594 | 0.0113253 |
| Otu2793 | NO3 | 0.0000050 | 0.0000039 | 1.2682777 | 0.2941783 | 0.4412674 |
| Otu3512 | O2 | 0.0000000 | 0.0000033 | 0.0119645 | 0.9912051 | 0.9912051 |
| Otu3512 | H2S | 0.0000020 | 0.0000439 | 0.0449804 | 0.9669496 | 0.9912051 |
| Otu3512 | NO3 | 0.0000191 | 0.0000258 | 0.7396834 | 0.5131159 | 0.5597628 |
| Otu3610 | O2 | 0.0000005 | 0.0000005 | 0.9602731 | 0.4077575 | 0.4666414 |
| Otu3610 | H2S | 0.0000644 | 0.0000067 | 9.6541059 | 0.0023594 | 0.0113253 |
| Otu3610 | NO3 | 0.0000050 | 0.0000039 | 1.2682777 | 0.2941783 | 0.4412674 |
Figure 12. Relative abundance of OTU0308 versus nutrient concentration, points recolored by depth.
| ASV | Variable | Estimate | Std_Error | t_value | p_value | FDR_corrected |
|---|---|---|---|---|---|---|
| Asv277 | O2 | -0.0003010 | 0.0001398 | -2.1533547 | 0.1203267 | 0.5053720 |
| Asv277 | H2S | -0.0040259 | 0.0018458 | -2.1810766 | 0.1172306 | 0.5053720 |
| Asv277 | NO3 | -0.0019546 | 0.0010866 | -1.7987740 | 0.1698884 | 0.5351818 |
| Asv561 | O2 | 0.0000006 | 0.0000519 | 0.0119645 | 0.9912051 | 0.9912051 |
| Asv561 | H2S | 0.0000308 | 0.0006857 | 0.0449804 | 0.9669496 | 0.9912051 |
| Asv561 | NO3 | 0.0002986 | 0.0004037 | 0.7396834 | 0.5131159 | 0.5671281 |
| Asv578 | O2 | 0.0000150 | 0.0000156 | 0.9602731 | 0.4077575 | 0.5351818 |
| Asv578 | H2S | 0.0019946 | 0.0002066 | 9.6541059 | 0.0023594 | 0.0165161 |
| Asv578 | NO3 | 0.0001543 | 0.0001216 | 1.2682777 | 0.2941783 | 0.5351818 |
| Asv1153 | O2 | 0.0000333 | 0.0000347 | 0.9602731 | 0.4077575 | 0.5351818 |
| Asv1153 | H2S | 0.0044271 | 0.0004586 | 9.6541059 | 0.0023594 | 0.0165161 |
| Asv1153 | NO3 | 0.0003424 | 0.0002700 | 1.2682777 | 0.2941783 | 0.5351818 |
| Asv1250 | O2 | 0.0000154 | 0.0000160 | 0.9602731 | 0.4077575 | 0.5351818 |
| Asv1250 | H2S | 0.0020433 | 0.0002116 | 9.6541059 | 0.0023594 | 0.0165161 |
| Asv1250 | NO3 | 0.0001580 | 0.0001246 | 1.2682777 | 0.2941783 | 0.5351818 |
| Asv1620 | O2 | -0.0000548 | 0.0000487 | -1.1252336 | 0.3423877 | 0.5351818 |
| Asv1620 | H2S | -0.0007710 | 0.0006433 | -1.1986359 | 0.3167203 | 0.5351818 |
| Asv1620 | NO3 | -0.0002880 | 0.0003787 | -0.7606489 | 0.5021883 | 0.5671281 |
| Asv2216 | O2 | -0.0000702 | 0.0000731 | -0.9602731 | 0.4077575 | 0.5351818 |
| Asv2216 | H2S | -0.0007774 | 0.0009655 | -0.8051467 | 0.4796349 | 0.5671281 |
| Asv2216 | NO3 | -0.0007209 | 0.0005684 | -1.2682777 | 0.2941783 | 0.5351818 |
Figure 13. Relative abundance of ASV1250 versus nutrient concentration, points recolored by depth.
Thus, while the overall trends in abundance across depth, [H2S], [NO3-], and [O2] appear to be similar for mothur- and QIIME2-processed data, many more relationships were found to be statistically significant in the mothur-processed dataset, both at the relative genus level abundances as well as at the OTU/ASV level. The abundances appear to match the fact that the Sulfurimonas genus tends to inhabit anoxic/sulfidogenic regions of marine basins, including oxic-anoxic interfaces and hydrothermal vents [13].
Within the Sulfurimonas genus the richness determined by mothur and QIIME2 processed data was highly comparable. Both mothur and QIIME2 pipeline processed data appeared to reveal the same overall trends of community structure change with respect to nutrient concentration, though the proportion of differences which were statistically significant was far lower in the QIIME2 data. In general, diversity relative to richness was found to rise to a peak, after which it begins to decrease with increasing depth (Figures 4-5). Sulfurimonas abundance increases with increasing depth and sulfide concentration, and decreases with increasing oxygen and nitrate concentration (Figures 6-9).
Most of the trends observed in this study were consistent with our hypotheses. As a chemolithoautotroph, it is expected that Sulfurimonas be most abundant in deep, anoxic waters. Sulfurimonas metabolizes by reducing nitrate and nitrite and oxidizing sulfur containing compounds or hydrogen [5]. Thus, as hypothesized, Sulfurimonas relative abundance was significantly decreased in the presence of oxygen, where it will likely be outcompeted by organisms who can use oxygen as an electron donor. In the same way, Sulfurimonas relative abundance was significantly increased in the presence of sulfide. However, it was unexpected that Sulfurimonas abundance was significantly negatively associated with nitrate concentration. It may be that organisms which utilize nitrate more efficiently are outcompeting Sulfurimonas as at the depths where nitrate is high, the concentration of sulfide is low (Fig. 1). Potentially this association may be caused by other confounding environmental factors that were studied in this analysis such as temperature or microbial metabolites produced by specific microbes adapted to those particular anoxic regions of the water column. Regardless, this observation was contrary to what was expected, and no single explanation can be concluded based on the available data. Evidently, however, the abundance of Sulfurimonas in Saanich Inlet is likely to vary as a function of many nutrient concentrations, not just oxygen.
There appears to be a correlation between stratified layers of the water column and Sulfurimonas distribution. Samples used in this study were taken at a time of year when stratification due to a thermocline was present around 10m-100m. The temperature at 10m is substantially higher than lower depths. An increase in diversity was also observed from the surface to a peak at 10m. It could be hypothesized that the high diversity is due to ambient temperatures being ideal for bacterial growth with minimal limitation. In this study, it was also found that diversity tends to decrease with richness at lower depths, with minimums coinciding with the boundary of the thermocline. The low diversity at depth could also be explained by the high specificity required to thrive in such extreme environments. For instance, our taxon of interest, Sulfurimonas, is commonly found near deep sea hydrothermal vents and functions primarily by sulfide oxidation, a process which is favored by the high concentrations of sulfide in that habitat.
The overall trend of the results of this experiment align with our hypotheses and previous literature [6], and while the conducted statistical tests show that some of the correlations we expected to see are significant, a number of methodological issues make us less confident in the replicability of our results. The use of a linear regression model assumes that the relationships between the tested variables would be linear, which is highly unlikely to be true in a complex biological system. As an example, interspecies competition may make intermediary concentrations of a nutrient such as nitrate more beneficial, as at higher concentrations perhaps denitrifying bacteria would outcompete Sulfurimonas, and it is only at intermediate concentrations that it can find its niche. However, should the concentration be too low, Sulfurimonas would be unable to grow either. Thus, given that there will always be other confounding environmental variables not taken into account in the model, the apparent success of the model is more surprising than not. Future investigations into the subject should consider employing stronger statistical tests which more accurately measure the phenomenon being examined.
We are unable to draw any definitive conclusions regarding which pipeline used in this study is strictly superior. However, clear differences were observed in the results obtained from each, specifically in regards to the Sulfurimonas genus. Both suggested similar overall relationships between nutrient concentration and Sulfurimonas abundance. However, whether the identified relationships were statistically significant proved to be very different between the two analysis pipelines. Due to the aforementioned flaws in the statistical methodology used, we cannot confidently assert whether use of one pipeline is better than the other, but it is clear that the two produce different results, and choice of pipeline is extremely important for microbial ecology. It should be noted that had multiple-comparisons testing not been controlled for, far more test conditions would have yielded significance. This highlights the importance of ensuring statistical rigour of the experimental methodology, as well as the benefit of consulting a statistician during the experimental design process.
As each pipeline approaches the most important analytical step of sequences processing differently, each has its own guidelines and configuration profiles. Hence, choosing the correct pipeline with a set of parameters and algorithms for a given application is important. While for this study the parameters and databases were kept as similar as possible between the two, there are fundamental differences in the philosophy by which the two pipelines handle data. As mothur clusters individual sequences while QIIME2 groups by sample consistency, studies have shown that the effect of sequencing errors yielded a bigger impact on the results than choosing the appropriate gene region for amplification [11]. In addition, selecting a pipeline with higher sequence throughput could increase the chance of richness overestimation [11]. Because the analytical steps are paramount to microbial ecology research and discovery, variations in the quality of databases and their annotations could impact the validity of research results. Especially for clustering-first pipelines such as mothur and QIIME2, the choice of the reference database in terms of comprehensiveness and sensitivity has implications in the accuracy of microbial abundance estimation [12]. A standardized evaluation protocol may be beneficial to overcome the dilemma of pipeline selection. Regardless of the pipeline chosen, it is important going forward to ensure pipeline methodology is described completely, with every decision within the pipeline well-justified.
To build on the results of this study, future work analysing the role of each of the many environmental variables on abundance of Sulfurimonas by a stronger multivariate statistical test would be beneficial. Additionally, further analyses conducting processing with mothur and QIIME2 in parallel would enable stronger claims to be made regarding the relative ability of the two analysis pipelines, and more generally for OTU- and ASV-based analysis pipelines.
Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Finke J. 2017. A compendium of geochemical information from the Saanich Inlet water column. Sci Data. 4:170159. doi:10.1038/sdata.2017.159.
Ocean Networks Canada. 2013. Introduction to Saanich Inlet. Retrieved from http://www.oceannetworks.ca/introduction-saanich-inlet.
Stramma L, Schmidtko S, Levin LA, Johnson GC. 2010. Ocean oxygen minima expansions and their biological impacts. Deep Sea Res Part 1 Oceanogr Res Pap. 57(4):587-595. doi: 10.1016/j.dsr.2010.01.005
Wooley, J. C., Godzik, A., & Friedberg, I. (2010). A primer on metagenomics. PLoS Computational Biology, 6(2), e1000667. doi:10.1371/journal.pcbi.1000667
Callahan, B. J., Mcmurdie, P. J., & Holmes, S. P. (2017). Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME Journal, 11(12), 2639. doi:10.1038/ismej.2017.119
Labrenz M, Grote J, Mammitzsch K, Boschker HTS, Laue M, Jost G, Glaubitz S, Jürgens K. 2013. Sulfurimonas gotlandica sp. nov., a chemoautotrophic and psychrotolerant epsilonproteobacterium isolated from a pelagic redoxcline, and an emended description of the genus Sulfurimonas. Int J Syst Evol Microbiol. 63(Pt 11):4141-4148. doi:10.1099/ijs.0.048827-0.
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. 2009. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75:7537-7541. doi:10.1128/AEM.01541-09
Ortutay C, Ortutay Z. 2017. Introduction to R statistical environment, p 1-15. In Molecular Data Analysis, John Wiley & Sons, Hoboken, New Jersey, USA.
Bingham NH, Fry JM, & SpringerLink ebooks - Mathematics and Statistics. (2010). Regression: Linear models in statistics. London; New York: Springer.
Siegwald L, Touzet H, Lemoine Y, Hot D, Audebert C, Caboche S. 2017. Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics. PLoS One. 12(1):e0169563. doi:10.1371/journal.pone.0169563. Plummer E, Twin J, Bulach DM, Garland SM, Tabrizi SN. 2015. A Comparison of Three Bioinformatics Pipelines for the Analysis of Preterm Gut Microbiota using 16S rRNA Gene Sequencing Data. J Proteomics Bioinform. 8:283-291. Sievert SM, Scott KM, Klotz MG, Chain PSG, Hauser LJ, Hemp J, Hügler M, Land M, Lapidus A, Larimer FW, Lucas S, Malfatti SA, Meyer F, Paulsen IT, Ren Q, Simon K, the USF Genomics Class. 2008. Genome of the Epsilonproteobacterial Chemolithoautotroph Sulfurimonas denitrificans. Appl Environ Microbiol. 74(4):1145-1156. doi:10.1128/AEM.01844-07.
Welch et al. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci USA. 99(26):17020-17024. PNAS252529799
Sequences for nitrite reductase (nirS) gene were obtained from various depths of Saanich Inlet, a model ecosystem for the effects of growing oxygen minimum zones in the open ocean. Tree-based Sensitive and Accurate Protein Profiler (TreeSAPP) was implemented for automated reconstruction of the nitrogen cycle along defined redox gradients in Saanich Inlet for metagenomic and metatranscriptomic assemblies of nirS. Increased DNA and RNA abundance of nirS was found with increasing depth under the suboxic and anoxic conditions, and no expression was observed at oxic conditions. Gammaproteobacteria, Planctomycetia, Deltaproteobacteria and Alphaproteobacteria groups predominante the overall nirS expression in the Saanich Inlet. It was found that the distribution of nirS throughout the water column may be due to Planctomycetia expressing anammox-type nirS that thrives in lower suboxic zones, and Gammaproteobacteria expressing denitrification-type nirS which is highly suitable for denitrification in anoxic zones and upper suboxic zones. In addition, nirS OTUs have been shown to be station and time-dependent. As the Saanich Inlet is a seasonally anoxic fjord, the DNA and RNA abundances for denitrification genes should fluctuate with time as the water column moves in and out of anoxia. In this project, we investigate the nirS gene in the denitrification pathway to assess in terms of taxonomic rank, abundance and expression along the redoxcline in Saanich Inlet and propose future directions to expand on our results.
Saanich Inlet acts as an excellent model system for oxygen minimum zones (OMZs), due to its seasonally anoxic character. A shallow sill separates the body of the inlet from the Strait of Georgia, and traps deep water throughout the spring and summer. Microbial activity quickly depletes dissolved oxygen below 100m, when large amounts of organic matter sink during the spring phytoplankton bloom. Then, during turbulent winter weather, the mixed layer depth increases and the anoxic deep water is exchanged for new, oxygen rich water over the shallow sill [1]. These processes repeat annually, resulting in a cycle of deep water replenishment, followed by the development of an anoxic region. The predictable nature of changing oxygen concentrations, combined with excellent sampling conditions make Saanich Inlet a first-rate model OMZ ecosystem.
The expansion of OMZs is expected to cause drastic shifts in global marine microbial community compositions, and therefore their functional capabilities. There is a demand for models of how OMZs form due to a global increase in OMZ size and intensity [2]. OMZs are bands of hypoxic water between 100m and 1000m in depth. Microbial respiration fueled by sinking organic matter from the euphotic zone depletes dissolved oxygen all the way to anoxia in extreme circumstances. The strength and depth range of these zones has been shown to be increasing on average due to anthropogenic climate change. Oxygen is not only a vital nutrient for macroscopic life, but also a determining factor in how microbial communities function. When oxygen is not available, bacteria can reduce other chemicals for energy production. In the absence of oxygen, nitrate is often the first choice.
Nitrogen is cycled biologically on an immense scale globally. It is a necessary component of all life in relatively large amounts. For practical reasons, nitrogen is often divided into two main pools: fixed nitrogen, which is soluble in water and bioavailable, and unfixed nitrogen, which is in a gaseous form and diffuses to the atmosphere. Cycling between these two pools occurs by three main processes. Nitrogen fixation is the conversion of N2 gas to NH4+ by diazotrophs. In the ocean, diazotrophs are mostly cyanobacteria, which require high light and iron availability. This limits nitrogen fixation to relatively small portions of the surface ocean. The two processes that counter nitrogen fixation, releasing fixed nitrogen back to the atmosphere, are annamox and denitrification. Annamox converts NO3- and NH4+ to N2 but is thought to contribute less to bioavailable nitrogen loss than denitrification. Denitrification is the stepwise reduction of NO3- to N2. Both annamox and denitrification require extremely low O2 environments and are prevalent in oceanic OMZs.
The ability of a community to denitrify is dependant on the genetic presence of a full nitrogen reduction pathway. Often, microbes will not contain the genes or transcribe the genes necessary for all intermediate steps of a metabolic pathway, instead opting to rely on other organisms for the processing of the steps they lack [3]. NirS is of particular interest because it catalyzes the conversion of nitrite (NO2-) to nitric oxide (NO) (Fig 1), the point of no return in the denitrification pathway where a bioavailable ion is converted to a gas. Here the distribution of nirS genes and transcripts across depths and across taxa is analysed in a representative summer Saanich Inlet water column.
Figure 1. Nitrogen Cycle
Environmental samples were collected through time-series monitoring in Saanich Inlet on a monthly-basis aboard the MSV John Strickland at station S3 (48°35.500 N, 123°30.300 W) [4, 5]. Samples for large volume (LV) SSU rRNA gene tags, metagenomics, metatranscriptomics, and metaproteomics were taken from six depths spanning the oxycline (10, 100, 120, 135, 150, 165, and 200 m). Samples for high-resolution (HR) SSU rRNA gene tag sequencing were taken at 16 depths along the oxycline (10, 20, 40, 60, 75, 80, 90, 97, 100, 110, 120, 135, 150, 165, 185 and 200 meters). Water was collected with Go-Flo bottles and dissolved gas samples were taken immediately after the bottles were taken onboard the ship. Next, the remaining water was quickly drained into 2L Nalgene bottles through sterilized silicon tubing to minimise changes in microbial gene expression.A 0.22 μm Sterivex filter was used to collect biomass from water samples. Genomic DNA and RNA were then extracted from cells on these filters. RNA was reverse transcribed into cDNA and both genomic DNA and cDNA were used in the construction of shotgun Illumina libraries. Sequencing data were generated on the Illumina HiSeq platform with 2x150bp technology.
The resulting reads were processed and quality controlled at the Joint Genome Institute using the IMG/M pipeline. Metagenomes were assembled and processed using MetaPathways 2.5 at UBC.
Tree-based Sensitive and Accurate Protein Profiler (TreeSAPP) was implemented for automated reconstruction of the nitrogen cycle along defined redox gradients in Saanich Inlet using Google Cloud services for metagemomic (MetG, DNA) and metatranscriptomic (MetaT, RNA) assemblies of nirS.
Reads per kilo base per million mapped reads (RPKM) was used to quantify gene expression by normalizing for total read length and the number of sequencing reads when comparing the gene coverage values. This corrects the difference in sampling sequencing depth and gene length [6]. The formula below indicates numReads as the number of reads mapped to a gene sequence, geneLength as the length of the gene sequence and totalNumReads as the total number of mapped reads for a sample:
\[RPKM = \frac{numReads}{\frac{geneLength}{10^3}\times \frac{totalNumReads}{10^6}}\]
The following commands were used in the pipeline:
#!/bin/bash
cd /home/enenn/
export WISECONFIGDIR=/home/connor/TreeSAPP//data//genewise_support_files//wisecfg
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_10m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_10m.anqdp.fastq.gz -o treesapp/treesapp_out_G_10m
rm treesapp/treesapp_out_G_10m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_10m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_10m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_10m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_10m
rm treesapp/treesapp_out_T_10m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_10m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_100m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_100m.anqdp.fastq.gz -o treesapp/treesapp_out_G_100m
rm treesapp/treesapp_out_G_100m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_100m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_100m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_100m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_100m
rm treesapp/treesapp_out_T_100m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_100m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_120m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_120m.anqdp.fastq.gz -o treesapp/treesapp_out_G_120m
rm treesapp/treesapp_out_G_120m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_120m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_120m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_120m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_120m
rm treesapp/treesapp_out_T_120m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_120m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_135m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_135m.anqdp.fastq.gz -o treesapp/treesapp_out_G_135m
rm treesapp/treesapp_out_G_135m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_135m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_135m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_135m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_135m
rm treesapp/treesapp_out_T_135m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_135m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_150m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_150m.anqdp.fastq.gz -o treesapp/treesapp_out_G_150m
rm treesapp/treesapp_out_G_150m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_150m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_150m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_150m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_150m
rm treesapp/treesapp_out_T_150m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_150m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_165m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_165m.anqdp.fastq.gz -o treesapp/treesapp_out_G_165m
rm treesapp/treesapp_out_G_165m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_165m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_165m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_165m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_165m
rm treesapp/treesapp_out_T_165m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_165m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 \
-i bucket/MetaG_assemblies/SI072_LV_200m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_200m.anqdp.fastq.gz -o treesapp/treesapp_out_G_200m
rm treesapp/treesapp_out_G_200m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_200m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 \
-i bucket/MetaG_assemblies/SI072_LV_200m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_200m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_200m
rm treesapp/treesapp_out_T_200m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_200m 777
Interactive Tree of Life (iTOL) version 4.2 was used for the display, annotation and management of the phylogenetic trees produced for the nirS gene within the denitrification pathway [7].
DNA and RNA abundances of nirS were compared across depth and different taxa. Analysis was completed in R v3.4.3 [8] using the following packages and the TreeSAPP output was loaded into R and combined into a single dataframe, nirS.all using the following commands:
library(tidyverse)
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
##
## ggsave
#DNA data
nirS.DNA.10m = read_tsv("nirS_DNA_10m_marker_contig_map.tsv") %>%
select(Tax.DNA.10 = Confident_Taxonomy, Abund.DNA.10 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_character(),
## Likelihood = col_character(),
## WTD = col_character()
## )
nirS.DNA.100m = read_tsv("nirS_DNA_100m_marker_contig_map.tsv") %>%
select(Tax.DNA.100 = Confident_Taxonomy, Abund.DNA.100 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.DNA.120m = read_tsv("nirS_DNA_120m_marker_contig_map.tsv") %>%
select(Tax.DNA.120 = Confident_Taxonomy, Abund.DNA.120 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.DNA.135m = read_tsv("nirS_DNA_135m_marker_contig_map.tsv") %>%
select(Tax.DNA.135 = Confident_Taxonomy, Abund.DNA.135 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.DNA.150m = read_tsv("nirS_DNA_150m_marker_contig_map.tsv") %>%
select(Tax.DNA.150 = Confident_Taxonomy, Abund.DNA.150 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.DNA.165m = read_tsv("nirS_DNA_165m_marker_contig_map.tsv") %>%
select(Tax.DNA.165 = Confident_Taxonomy, Abund.DNA.165 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.DNA.200m = read_tsv("nirS_DNA_200m_marker_contig_map.tsv") %>%
select(Tax.DNA.200 = Confident_Taxonomy, Abund.DNA.200 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
#RNA data
nirS.RNA.10m = read_tsv("nirS_RNA_10m_marker_contig_map.tsv") %>%
select(Tax.RNA.10 = Confident_Taxonomy, Abund.RNA.10 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_character(),
## Likelihood = col_character(),
## WTD = col_character()
## )
nirS.RNA.100m = read_tsv("nirS_RNA_100m_marker_contig_map.tsv") %>%
select(Tax.RNA.100 = Confident_Taxonomy, Abund.RNA.100 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.RNA.120m = read_tsv("nirS_RNA_120m_marker_contig_map.tsv") %>%
select(Tax.RNA.120 = Confident_Taxonomy, Abund.RNA.120 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.RNA.135m = read_tsv("nirS_RNA_135m_marker_contig_map.tsv") %>%
select(Tax.RNA.135 = Confident_Taxonomy, Abund.RNA.135 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.RNA.150m = read_tsv("nirS_RNA_150m_marker_contig_map.tsv") %>%
select(Tax.RNA.150 = Confident_Taxonomy, Abund.RNA.150 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.RNA.165m = read_tsv("nirS_RNA_165m_marker_contig_map.tsv") %>%
select(Tax.RNA.165 = Confident_Taxonomy, Abund.RNA.165 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.RNA.200m = read_tsv("nirS_RNA_200m_marker_contig_map.tsv") %>%
select(Tax.RNA.200 = Confident_Taxonomy, Abund.RNA.200 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
#Combine all data frames
nirS.all = nirS.DNA.100m %>%
full_join(nirS.DNA.120m, by = "Query") %>%
full_join(nirS.DNA.135m, by = "Query") %>%
full_join(nirS.DNA.150m, by = "Query") %>%
full_join(nirS.DNA.165m, by = "Query") %>%
full_join(nirS.DNA.200m, by = "Query") %>%
full_join(nirS.RNA.100m, by = "Query") %>%
full_join(nirS.RNA.120m, by = "Query") %>%
full_join(nirS.RNA.135m, by = "Query") %>%
full_join(nirS.RNA.150m, by = "Query") %>%
full_join(nirS.RNA.165m, by = "Query") %>%
full_join(nirS.RNA.200m, by = "Query") %>%
mutate(Taxonomy = ifelse(!is.na(Tax.DNA.100), Tax.DNA.100,
ifelse(!is.na(Tax.DNA.120), Tax.DNA.120,
ifelse(!is.na(Tax.DNA.135), Tax.DNA.135,
ifelse(!is.na(Tax.DNA.150), Tax.DNA.150,
ifelse(!is.na(Tax.DNA.165), Tax.DNA.165,
ifelse(!is.na(Tax.DNA.200), Tax.DNA.200,
ifelse(!is.na(Tax.RNA.100), Tax.RNA.100,
ifelse(!is.na(Tax.RNA.120), Tax.RNA.120,
ifelse(!is.na(Tax.RNA.135), Tax.RNA.135,
ifelse(!is.na(Tax.RNA.150), Tax.RNA.150,
ifelse(!is.na(Tax.RNA.165), Tax.RNA.165,
ifelse(!is.na(Tax.RNA.200), Tax.RNA.200,
"unclassified"))))))))))))) %>%
select(-starts_with("Tax.")) %>%
gather("Key", "Abundance", starts_with("Abund")) %>%
separate(Key, c("Key","Type","Depth_m"), by = ".") %>%
select(Depth_m, Type, Abundance, Taxonomy, Query) %>%
mutate(Depth_m = as.numeric(Depth_m)) %>%
separate(Taxonomy, into = c("Domain", "Phylum", "Class", "Order",
"Family","Genus", "Species"), sep=";")
## Warning: package 'bindrcpp' was built under R version 3.4.4
## Warning: Expected 7 pieces. Missing pieces filled with `NA` in 1428
## rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
## 20, ...].
Figure 2. DNA vs. RNA relative abundance of nirS gene and transcripts from 10 to 200 metres depth in Saanich Inlet.
As shown in Figure 2, the DNA and RNA abundances of the nirS gene vary significantly by depth, with the general trend being that abundance increases with depth below 100m. Above 100m, the only depth that was measured was 10m, and there was neither DNA nor RNA found for the nirS gene. The abundance of DNA is higher than that of RNA, down to 150m, where RNA abundance increases significantly by about one order of magnitude from 82 to 750. This means that transcription of nitrite reductase, nirS, increases under anoxic conditions, which start around 150m. RNA abundance reaches a maximum at 165m, but the DNA abundance decreases by a sizeable amount at this depth, which could mean that there is lower gene diversity, but high rates of transcription in each organism, leading to more RNA. DNA abundance reaches a maximum at 200m, and RNA abundance decreases slightly, which means that there could be more diversity, but lower transcription rates for each organism. Overall, RNA abundances are significantly higher than DNA abundance below 150m, but above this depth, DNA abundances are higher, meaning that the nirS gene is present in cells, but it is not transcribed at high rates.
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font
## metrics unknown for Unicode character U+03bc
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): conversion
## failure on ' (μM)' in 'mbcsToSbcs': dot substituted for <ce>
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): conversion
## failure on ' (μM)' in 'mbcsToSbcs': dot substituted for <bc>
## Warning: Removed 1205 rows containing missing values (geom_point).
Figure 3. A: Change in nitrite, nitrate, and oxygen concentrations (Log10 scale) from 10 to 200 metres depth in Saanich Inlet. B: Change in DNA vs. RNA abundance of the nirS gene from 10 to 200 metres depth across different Orders in Saanich Inlet. Note that the Log10 was used for concentration as the nitrite concentration was relatively low compared to nitrate and oxygen.
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font
## metrics unknown for Unicode character U+03bc
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): conversion
## failure on ' (μM)' in 'mbcsToSbcs': dot substituted for <ce>
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): conversion
## failure on ' (μM)' in 'mbcsToSbcs': dot substituted for <bc>
## Warning: Removed 1205 rows containing missing values (geom_point).
Figure 4. A: Change in nitrite, nitrate, and oxygen concentrations (Log10 scale) from 10 to 200 metres depth in Saanich Inlet. B: Change in DNA vs. RNA abundance of the nirS gene from 10 to 200 metres depth across different Classes in Saanich Inlet. Note that the Log10 was used for concentration as the nitrite concentration was relatively low compared to nitrate and oxygen.
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font
## metrics unknown for Unicode character U+03bc
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): conversion
## failure on ' (μM)' in 'mbcsToSbcs': dot substituted for <ce>
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): conversion
## failure on ' (μM)' in 'mbcsToSbcs': dot substituted for <bc>
## Warning: Removed 1205 rows containing missing values (geom_point).
Figure 5. A: Change in nitrite, nitrate, and oxygen concentrations (Log10 scale) from 10 to 200 metres depth in Saanich Inlet. B: Change in DNA vs. RNA abundance of the nirS gene from 10 to 200 metres depth across different Families in Saanich Inlet. Note that the Log10 was used for concentration as the nitrite concentration was relatively low compared to nitrate and oxygen.
Figure 6. Relative abundances of Class at various depths in Saanich Inlet. The most abundant class with 90%-100% relative abundance at all depth are Acidlmicroblia, and Actinobacteria, The intermediate abundance of class between 53%-85% relative abundance are mainly unclassified Archaea, Deltaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria. The Phyla with the lowest relative abundances with approximately 2%-20% are mainly Marinimicrobia, Planctomycetes, Varrucomicrobia and unclassified Proteobacteria.
Figure 7. nirS relative DNA and RNA expression in 100m, 120m, 135m, 150m, 165m, 200m depth relative to taxa. Taxonomic classification of organisms are divided by different colours, shown in a wheel at the end of the tree branch. The green and red bars extending out from the organism names depict relative DNA and RNA abundances of nirS gene, respectively. 10m is not shown as there were no DNA or RNA expression of nirS for all analyzed organisms.
As shown in Figures 4 and 7, diversity of organisms containing the nirS gene varies with depth. Gammaproteobacteria genera Colwellia, Oleispira, Olephilus, Thalassomonas, Ridgeia and Sedimenticola have consistent relative level of DNA expression compared to other organisms at all analyzed depths. At 100m, the level of RNA transcription follows the trend of DNA expression in members of the Gammaproteobacteria, Planctomycetia and Deltaproteobacteria class. At 120m, the RNA transcripts of nirS for Gammaproteobacteria are less than DNA. However, the inverse is true for some members of the Alphaproteobacteria, Anaerolineae, Planctomycetia, Deltaproteobacteria, Deinococci, Bacteroidetes, Aquificae, Epsilonproteibacteria, Aciditiobacillia classes, in which the RNA transcription level is much higher than DNA expression. Of note, Planctomycemia have the highest RNA abundance. At 135m, the same trend in relative DNA and RNA is observed as at 120m. At 150m, the O2 concentration decreases to zero, there is high nirS expression in Gammaproteobacteria, and medium to low expression in Alphaproteobacteria. RNA levels follows the trend of DNA expression. At 165m, NO3 concentration also decreases to zero. NirS expression decreases in Alphaproteobacteria, but slightly increases in Planctomycetia and Deltaproteobacteria. RNA abundances still follows the trend of DNA. At 200m, the O2, NO3 and NO2 concentrations are zero. RNA is lower than DNA in all previously mentioned Gammaproteobacteria species except for Olephilus. However, RNA abundance is slightly higher than DNA in the Anaerolineae class. We must mention that the systemic nirS expression observed in all species at 135m, 150m and 200m is an artifact. There is an organism identified at a higher taxonomic level with nirS expression, but the pipeline cannot differentiate it further. Thus, nirS expression was conferred to all species under that taxonomic level.
Denitrification is a sequential, four-step process, nitrate (NO3-), a fixed inorganic species, is reduced to dinitrogen gas (N2). This process only occurs under suboxic and anoxic conditions (1-2% oxygen saturation), where the most favourable terminal electron acceptor (TEA), oxygen (O2) is limiting and organisms take up the next favourable TEA, NO3- [9,10]. Our gene of interest, nirS; nitrite reductase, is involved in the reduction of nitrite (NO2-) to nitric oxide (NO) in the second step of denitrification and also contains cytochrome cd1 [11, 12]. Within the water column of Saanich Inlet, dissolved oxygen concentrations decrease steadily with depth until 150m, where values quickly approach zero (Figures 3A, 4A, 5A.). This results to a sharp increase in DNA and RNA abundance, consistent with the fact that denitrification happens under suboxic and anoxic conditions (Figures 3A, 4A, 5A.).
This nitrite reduction process is shared with another nitrite reductase, nirK, which contains copper instead of the cytochrome cd1 used in nirS [11]. Gene analyses from experiments done in other other anoxic/suboxic basins, such as the Baltic Sea, Black Sea and the San Francisco Bay, have shown that gene abundances were about 1-2 orders of magnitude higher for nirS than in nirK in the San Francisco Bay [13]. OTU abundances based on a 95% sequence similarity cutoff for all experiments were also higher for nirS compared to nirK by up to five times at certain stations in the Baltic Sea and the San Francisco Bay [13,14]. However, in the Black Sea, the OTU abundance of nirS is either approximately the same, or up to five times lower than the abundance of nirK [15]. Alpha-diversity and richness of nirS and nirK OTUs are largely dependent on the station and time of year, which is seen in all three experiments [13,14,15].
Our TreeSAPP analysis showed that the Gammaproteobacteria, Planctomycetia, Deltaproteobacteria and Alphaproteobacteria groups predominante the overall nirS expression in the Saanich Inlet. There are two noteworthy points to highlight from the results. First, we observed that the Gammaproteobacteria is present in the Saanich Inlet at relatively high abundances of 50-85% at all depths (Figure 6.), which corresponds to also having the highest relative abundance of nirS gene copies at all analyzed depths (Figure 7.). This observation suggests a high level of denitrifying metabolic activity within this group. Studies with pure cultures and seawater enrichments have shown that the Gammaproteobacteria may maintain a higher ribosome content per cell [16,17], with which the increased rRNA levels may permit these bacteria to better respond to rapid changes in oxygen concentration in the seasonally anoxic Saanich Inlet [18]. Secondly, despite Gammaproteobacteria exhibiting relatively high level of nirS DNA, lower RNA abundance was detected at 120m and 135m, where non-Proteobacteria groups such as Planctomycetia dominate (Figure 7.). It is also worthy to note that Planctomycetia composes less than 10% of overall organism abundance at these depths (Figure 6.), but disproportionality accounts for a large portion of nirS expression (Figure 7.). Planctomycetia is known to mainly carry out anammox metabolism [19]. Gene-sequencing studies investigating expression of anammox sequences have shown that nirS genes expressed consistently at lower suboxic zones, such as 120m and 135m in Saanich, are closely allied to the environmental anammox clade [20]. The presence of anammox bacteria in the lower suboxic zone may be due to the flux of ammonium from the sulfide zone that affects these depths [20].
Meanwhile, Gammaproteobacteria may express nirS that is more closely related to denitrification pathways, which studies have shown to be more abundant in anoxic zones, and also in upper suboxic zones (O2 > 10 µM) [20]. While anoxic environments are the most suitable for denitrification, studies have also suspected that the enzymatic machinery in Alpha- and Gammaproteobacteria for anaerobic denitrification is likely to be similar to aerobic denitrification, allowing possible denitrification in upper suboxic zones [21]. In other words, the distribution of nirS throughout the water column may be due to Planctomycetia expressing anammox-type nirS that thrives in lower suboxic zones, and Gammaproteobacteria expressing denitrification-type nirS which is highly suitable for denitrification in anoxic zones and upper suboxic zones.
In this investigation, we have seen the abundance and expression of nirS vary with taxa and location. This is one of a number of genes involved in the distributed metabolism of the N cycle. It is likely advantageous for the genes in the N cycle to be specialized in order to interact with the many forms nitrogen can adopt. It is also advantageous to distribute these genes across specific taxa in the community since this increases the adaptability of microbial communities in response to their ecosystem. In addition, transitions in the oxidation state of nitrogen occur as a result of thermodynamically driven microbial reactions. It would be energetically irrational for an organism inhabiting anoxic regions of the ocean to actively express genes involved in nitrification. Microbial communities involved in the N cycle have evolved in symbiosis to achieve energetically efficient niches. For example, consider the diazotrophic cyanobacterium UCYN-A. UCYN-A is lacking photosystem II, RubisCo and the tricarboxylic acid cycle, meaning it is not capable of fixing carbon. Instead, it has formed a symbiotic relationship with a prymnesiophyte where UCYN-A receives fixed carbon in exchange for nitrogen [22]. Until the identification of UCYN-A, nitrogen fixation in the oceans had been attributed mainly to Trichodesmium [23]. Further investigation will likely yield even more discoveries indicating the vastness of the N cycle. Ecological studies of microbial community dynamics could provide further insight into the functioning N cycle.
It should be noted that some genes in the nitrogen cycle have similar functions. For example, the nirK gene also performs nitrite reduction, like nirS. As alpha-diversity and richness of nirS and nirK OTUs have been shown to be station and time-dependent [13,14,15], this should also apply to Saanich Inlet, as it is a seasonally anoxic fjord, which can affect the distribution of denitrification pathways at different parts of the year as oxygen concentrations in the water column change. As a result, DNA and RNA abundances for denitrification genes should fluctuate with time as the water column moves in and out of anoxia. Accordingly, an avenue for future research would be to compare nirS abundance to nirK abundance, by obtaining the nirK dataset using TreeSAPP.
Another method of microbial NO2- loss in the water column is through anaerobic ammonium oxidation (anammox), which converts either NO2- or NH4+ to dinitrogen gas (N2) using either the Hao (hydroxilamine reductase) or Hzo (hydroxylamine oxidoreductase) gene [12]. Autotrophic bacteria have been shown to undergo anammox [24], meaning that they could potentially have a selective advantage where organic carbon flux from the surface is limited, given that they can produce their own organic carbon. In addition, there are also bacteria that can use an alternative organic carbon source, propionate, for growth [25]. Although bacteria that undergo anammox are slow-growing [26], these alternative source of organic carbon, means that they can potentially have a competitive advantage over nitrite reducers in certain areas of the ocean. Although our analysis is primarily focused on denitrification pathways, comparing nirS and nirK abundance with Hao and Hzo abundances would help determine if annamox has an effect on denitrification rates in Saanich Inlet.